feat: helix-org prototype with MCP, prompt-driven CLI, transports (webhook/email/github), and Role/Identity split#2286

Draft
philwinder wants to merge 41 commits into main from feat/helix-org-prompt-driven-mcp

Member

@philwinder philwinder commented Apr 25, 2026

Summary

Introduces helix-org, a standalone Go prototype for a hybrid human/AI organization system. This PR is a WIP/Draft collecting the core infrastructure, three transport implementations, MCP prompts (slash commands), and a set of runnable demos.

Core platform

  • Model Context Protocol (MCP) Integration: All mutations flow through MCP endpoints at /workers/{id}/mcp using Streamable HTTP transport. Tool visibility is grant-filtered per worker.

  • MCP Prompts (Slash Commands): Server-defined prompts are registered in the MCP surface alongside tools. Each prompt has a name, title, description, arguments, and a render method that produces seed messages. Prompts are grant-gated: a prompt is visible only when the tool it requires is visible. The auto-generated /help command walks the registry at render time, so new prompts appear without manual updates. The /role command drafts a new Role from a title hint, expands it into a full interview template, saves it via create_role, then offers edits or chains to hire_worker.

  • Chat Typeahead: A UI dropdown lists the available slash commands, refreshed on every keyup in the chat textarea. Expansion happens server-side in the chat bridge: SendHandler intercepts /name inputs and expands them from the prompt template before sending to claude. The user sees the original input in their bubble; claude gets the expanded text. This makes commands discoverable from inside the chat itself.

  • Enum Schema Hints: WorkerKind and TransportKind surface as enums in the JSON Schema that MCP clients see, enabling better autocomplete. Validation errors are self-documenting: unknown worker kind "foo" (valid: "human", "ai") so clients can self-correct.

  • Prompt-Driven CLI: New helix-org prompt subcommand spawns Claude Code with inline MCP configuration, enabling natural-language orchestration of the entire organization graph (Roles, Workers, Positions, Streams, Grants).

  • Role vs Worker Split: Separates the job (Role: owner-edited markdown, fanned out via update_role) from the person (Worker: per-hire identity, immutable). Allows live edits to job descriptions without touching identities.

  • Environment Provisioning & Push Dispatch: Each Worker gets an isolated environment directory. When events land on subscribed Streams, the system spawns a fresh Claude Code activation (one-shot) with that worker's MCP endpoint. Role and identity are stamped into the environment; the agent reads them and acts on the event trigger.

  • Canonical Message envelope: Every Event.Body is a domain.Message JSON (From / To / Subject / Body / ThreadID / InReplyTo / MessageID / Extra). The spawner renders every populated field into the activation prompt so Workers branch on transport-shaped metadata directly, without a separate read_events round-trip.

  • Simplified Grant Model: Grants are strictly (WorkerID, ToolName) pairs with no enforcement/scope logic. A grant is the permission; the agent is trusted to comply.
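
The self-documenting validation error quoted under Enum Schema Hints above could be produced along these lines. This is a minimal sketch, not the PR's code: the `WorkerKind` type name and the error text come from the description, while `ParseWorkerKind` and `validWorkerKinds` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkerKind is a string enum; the two values are the ones named in the PR text.
type WorkerKind string

const (
	WorkerHuman WorkerKind = "human"
	WorkerAI    WorkerKind = "ai"
)

// validWorkerKinds (hypothetical) is one list that could feed both the
// validator and the JSON Schema enum hint, so the two cannot drift apart.
var validWorkerKinds = []WorkerKind{WorkerHuman, WorkerAI}

// ParseWorkerKind (hypothetical) returns the matching kind, or an error
// that lists every valid value so a client can self-correct.
func ParseWorkerKind(s string) (WorkerKind, error) {
	quoted := make([]string, 0, len(validWorkerKinds))
	for _, k := range validWorkerKinds {
		if string(k) == s {
			return k, nil
		}
		quoted = append(quoted, fmt.Sprintf("%q", string(k)))
	}
	return "", fmt.Errorf("unknown worker kind %q (valid: %s)", s, strings.Join(quoted, ", "))
}

func main() {
	_, err := ParseWorkerKind("foo")
	fmt.Println(err) // unknown worker kind "foo" (valid: "human", "ai")
}
```

Deriving the message from the same slice that defines the enum keeps the error text accurate when a new kind is added.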

Transports

Streams own their I/O. Four transport kinds: the in-process default plus three external transports, each behind its own package:

  • Local (default): in-process pub/sub between Workers.
  • Webhook: bidirectional HTTP. Outbound POSTs to a configured URL on every published event; inbound deliveries are HMAC-verified and fanned out to subscribed Workers. Demo: secretary worker bridges an external webhook to internal channels.
  • Email (Postmark): outbound via Postmark API; inbound via Postmark's webhook with alias-based stream routing. Demo: two-worker email exchange (Sam <-> Lee).
  • GitHub (inbound only): single /github/webhook endpoint, HMAC-verified via X-Hub-Signature-256, fans out to every Stream whose repo + events whitelist matches. Acting on a repo (label, comment, review, open PR) is the Worker's job via gh in its Environment; publish on a github stream returns a loud error. Demos: doc-engineer reviews docs PRs and tags docs issues; github-engineer implements features on a GitHub Project v2 board.
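
For readers unfamiliar with the `X-Hub-Signature-256` check mentioned for the GitHub transport, here is a minimal sketch of how such a verification is commonly written. GitHub sends `sha256=` followed by the hex HMAC-SHA256 of the raw request body; the function names `signBody` and `verifySignature` are hypothetical and are not taken from the PR.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// signBody (hypothetical helper) computes the header value a sender would
// attach: "sha256=" + hex(HMAC-SHA256(secret, body)).
func signBody(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature (hypothetical helper) recomputes the HMAC over the raw
// body and compares it to the header in constant time.
func verifySignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // illustrative value only
	body := []byte(`{"action":"opened"}`)
	header := signBody(secret, body)
	fmt.Println(verifySignature(secret, body, header))               // true
	fmt.Println(verifySignature(secret, []byte("tampered"), header)) // false
}
```

The comparison uses `hmac.Equal` rather than `==` so that timing does not leak how many leading bytes matched.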

Operational config

  • DB-stored, redacted-by-default: provider credentials live in transport.<kind> keys with explicit Secrets: []string declarations. helix-org config get redacts every declared secret; regression tests pin the spec for both transport.postmark and transport.github so a future refactor can't silently drop a redaction entry.
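
The redacted-by-default behaviour can be pictured with a small sketch. It assumes, purely for illustration, that a config value is a flat JSON object and that each entry in `Secrets` names a top-level field; the `Spec` shape, the `Redact` function, and the `[REDACTED]` placeholder are hypothetical and may differ from the PR's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Spec (hypothetical shape) declares which fields of a config key are secret.
type Spec struct {
	Key     string
	Secrets []string
}

// Redact (hypothetical) returns the stored JSON with every declared secret
// masked, unless reveal is true.
func Redact(spec Spec, raw []byte, reveal bool) ([]byte, error) {
	var fields map[string]interface{}
	if err := json.Unmarshal(raw, &fields); err != nil {
		return nil, err
	}
	if !reveal {
		for _, name := range spec.Secrets {
			if _, ok := fields[name]; ok {
				fields[name] = "[REDACTED]"
			}
		}
	}
	// json.Marshal emits map keys in sorted order, so output is stable.
	return json.Marshal(fields)
}

func main() {
	spec := Spec{Key: "transport.postmark", Secrets: []string{"token"}}
	stored := []byte(`{"token":"abc","from":"a@b.c"}`) // illustrative values
	out, err := Redact(spec, stored, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"from":"a@b.c","token":"[REDACTED]"}
}
```

A regression test of the kind the description mentions would pin the `Secrets` list itself, so that removing an entry fails the build rather than silently exposing a credential.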

Design Philosophy

  • Data/text over code: Configuration lives in Role markdown and prompts, not Go logic.
  • Keep core generic: Tools define their own scope and schemas; new tools are addable without core changes.
  • No workflow in code: Orchestration logic lives in Role prompts, not implicit chains in the codebase.
  • Smallest thing that works: No speculative abstractions.

What's Inside

  • domain/: Core types (Role, Worker, Position, Stream, Grant, Event, Message, Transport) + enum validators
  • prompts/: Prompt interface, Registry, builtins (/help, /role)
  • store/sqlite/: GORM-driven SQLite with AutoMigrate (no raw SQL migrations)
  • tools/: 13 MCP tools + spawner + registry + JSON schema enum hints
  • server/: HTTP endpoints for reads + MCP mutation handler + jsonapi.org serialization + chat bridge with slash expansion
  • cmd/helix-org/: CLI with serve, bootstrap, chat, config subcommands
  • broadcast/ & dispatch/: Event bus for push-based worker activation
  • transports/postmark, transports/github: provider-specific I/O packages
  • demos/: getting-started, newsroom, webhook, email, github, github-engineer - runnable end-to-end
  • design/: design docs for the canonical envelope, the email transport, the github transport

Testing

All code is tested end-to-end:

  • Bootstrap -> role create -> worker hire -> event publish -> worker activation with MCP -> live-edit role -> behavior change
  • Prompt registry auto-generation (Help sees new prompts registered after it)
  • Chat slash expansion and typeahead filtering
  • Enum schema and validation error formatting
  • Transport unit tests for HMAC verification, payload mapping, redaction
  • make check passes: 0 lint issues, race detector clean

Next Steps (Post-WIP)

  • Add persistent authentication (currently all callers are treated as root owner)
  • Move provider credentials to per-Worker scope so different teams can use different GitHub identities / inboxes
  • Extend to support human operators at the REPL
  • Integrate with the broader Helix platform

WIP because: the core prototype is complete and tested, but we're still validating the design with the broader team before finalizing the API surface and documentation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Update — domain/runtime split + unified Helix session shape

  • Helix-specific Worker fields moved off domain.Worker to a sidecar WorkerRuntimeState keyed on (workerID, backend, key). Six methods dropped from the domain interface.
  • Runtime layer moved out of tools/: new agent/, agent/claude/, agent/helix/ packages plus helix/helixclient/. tools/ now holds only org-graph MCP tools.
  • SpecsPublisher -> agent.WorkspaceSync. Logical-name contract (role.md, identity.md); each backend translates to its own layout. Fixes the prior path mismatch where update_role wrote job/* but the activation mandate read .context/*.
  • agent.md moved from tools/templates/ to agent/policy.md and embedded as agent.Policy so both runtimes share one source.
  • Unified Helix session shape: helix.Runtime (zed_agent) and helix.AgentType (zed_external) are non-configurable constants used by every project apply and every /sessions/chat post. Drops chat.agent_type config key and the Runtime fields on the spawner/applier so the spawner and chat backend can no longer drift to claude_code.

Verified end-to-end against app.helix.ml (getting-started demo).


Demos

The PR now includes seven runnable end-to-end demos:

  1. getting-started — bootstrap, hire echo worker, publish/read events, live edit role.
  2. webhook — inbound/outbound webhook transport, secretary summarizes and forwards.
  3. email — bidirectional Postmark, two-worker support escalation with threading.
  4. newsroom — multi-worker publishing pipeline (editor, fact-checker, publisher).
  5. github — GitHub webhook inbound, multiple workers acting on issues/PRs via gh CLI.
  6. github-engineer — GitHub Project v2 board worker implementing features spec-style.
  7. manufacturing — NCR triage with Helix backend + comms-demo mock-channels: operator raises NCR → agent fans out (Slack/SMS/Email) → supervisor approves → agent confirms. Shows the hold pattern and the agent/human split.

Notes for reviewers

Manufacturing demo is the newest and was verified end-to-end against app.helix.ml:

  • Uses Helix-backed spawner + chat (not local claude).
  • Three webhook streams (supervisor DM, customer SMS, supplier email).
  • Role file bakes reference data (SPC, maintenance log, related NCRs, affected orders) so no external systems needed.
  • Two agent activations: NCR raised → fan out; supervisor reply → confirm & conditional send.
  • ~90 seconds on stage, pre-flight & setup ~5 minutes.
  • Demonstrates the core value: agent assembles evidence and drafts; humans make three decisions (not chase data across seven systems).

All demos pass make ci (formatting, lint, race tests).

@philwinder philwinder force-pushed the feat/helix-org-prompt-driven-mcp branch 2 times, most recently from d9a9c99 to 01e9388 Compare April 27, 2026 13:23
@philwinder philwinder changed the title from "feat: helix-org prototype with MCP, prompt-driven CLI, and Role/Identity split" to "feat: helix-org prototype with MCP, prompt-driven CLI, transports (webhook/email/github), and Role/Identity split" on Apr 28, 2026
philwinder and others added 27 commits May 4, 2026 11:43
…ity split

Adds a complete proto-implementation of helix-org as a standalone Go project with:

- **MCP Integration**: All mutations flow through Model Context Protocol at /workers/{id}/mcp
  using Streamable HTTP transport. Tool list is grant-filtered per worker.

- **Prompt-Driven CLI**: New `helix-org prompt` subcommand spawns Claude Code with inline
  MCP config, enabling natural-language orchestration of the entire org graph.

- **Role vs Worker Split**: Roles are job descriptions (owner-edited markdown, fanned out
  via update_role). Workers are people in positions (per-hire identities, immutable).

- **Environment Provisioning**: Each Worker gets an isolated environment directory with:
  - role.md (propagated via update_role)
  - identity.md (per-hire, immutable)
  - agent.md (fixed stub: "Read role.md and identity.md, act on trigger")
  - mcp.json (dynamically generated per activation)

- **Push-Dispatch Event Loop**: When events land on subscribed channels, the system spawns
  a fresh Claude Code instance (one-shot activation) with that worker's MCP endpoint.

- **channel_members Tool**: Read-only MCP tool that lists workers subscribed to a channel,
  enabling Workers to query org membership without side effects.

- **Simplified Grant Model**: Grants are now strictly (workerID, toolName) pairs. Removed
  enforcement/scope entirely—a grant IS the permission, and the agent is trusted to comply.

- **Humanized Demos**: Getting-started and newsroom demos now use prompt-based CLIs with
  natural-language orchestration instead of raw API calls.
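
To make the simplified grant model concrete: with grants reduced to (workerID, toolName) pairs, the grant-filtered tool list is just a set lookup. The sketch below is an assumption-laden illustration; `Grant` mirrors the pair described above, while `visibleTools` is a hypothetical helper, not a function from the PR.

```go
package main

import "fmt"

// Grant is the (workerID, toolName) pair described in the commit message.
type Grant struct {
	WorkerID string
	ToolName string
}

// visibleTools (hypothetical) returns the subset of all tools that the given
// worker holds a grant for, preserving the registry order.
func visibleTools(all []string, grants []Grant, workerID string) []string {
	allowed := make(map[string]bool)
	for _, g := range grants {
		if g.WorkerID == workerID {
			allowed[g.ToolName] = true
		}
	}
	var out []string
	for _, tool := range all {
		if allowed[tool] {
			out = append(out, tool)
		}
	}
	return out
}

func main() {
	all := []string{"create_role", "hire_worker", "publish"}
	grants := []Grant{
		{WorkerID: "w-owner", ToolName: "create_role"},
		{WorkerID: "w-owner", ToolName: "publish"},
		{WorkerID: "w-sam", ToolName: "publish"},
	}
	fmt.Println(visibleTools(all, grants, "w-sam")) // [publish]
}
```

There is no scope or condition to evaluate: holding the pair is the whole permission check.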

Major components:
- domain/: Core types (Role, Worker, Position, Channel, Grant, Event)
- store/sqlite: GORM-driven SQLite storage with AutoMigrate
- tools/: 13 MCP tools (create_role, hire_worker, etc.) + spawner
- server/: HTTP endpoints + MCP handler + jsonapi.org serialization
- cmd/helix-org: CLI with serve, bootstrap, prompt subcommands
- broadcast/dispatch: Event bus for push-based activation
- demos/: Two runnable examples (getting-started, newsroom editorial team)

Design principles embedded:
- Prefer data/text over code (config in Role markdown, not Go)
- Keep core generic (tools define their own scope and schemas)
- No workflow in code (agents orchestrate via prompts, not implicit chains)
- Write smallest thing that works (no speculative abstractions)

All code tested end-to-end: bootstrap → role create → worker hire → event publish →
worker activation with MCP → live-edit role → behaviour change on next activation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
A minimal three-Worker demo that produces an opinionated MLOps
newsletter with a fresh angle each issue. Shows the prompt-driven
philosophy at its tightest:

- Only files on disk are 3 short role markdown files (~25 lines each)
- A single helix-org prompt call creates the roles, positions,
  channels, and hires the team
- Editor picks the angle, researcher hunts for matching news,
  journalist crafts the narrative
- Re-run with a different brief and the same team produces a
  completely different angle on the same broad subject

Tested end-to-end: two briefs produced two distinct angles
("platform team tax" vs "feature stores as MLOps' open secret
graveyard") with named subjects (Stitch Fix, Chime, Modal Labs,
Tecton) — proving the angle truly varies per brief.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Adds a new `helix-org tail [glob...]` CLI plus the `GET /tail`
endpoint it talks to. Lets the human watch the cascade of a running
team in real time without curl + jq incantations.

- Defaults to '*' (all channels). Globs use Go's path.Match:
  'c-*', 'c-news?', 'c-newsletter'. Multiple globs unioned.
- Long-polls (default 30s wait, configurable via --wait).
- Pretty output: HH:MM:SS  channel  source  body, with subsequent
  body lines indented under the body column. ANSI colour when
  stdout is a TTY; --no-color to disable.
- New broadcast.Broadcaster.SubscribeAll for wildcard wakes, so
  channels created mid-tail (e.g. by an editor's hire trigger)
  also wake the tail loop.
- New store.Events.ListSince(channelIDs, since, limit) returning
  oldest-first events strictly newer than the named event.
- URL surface designed to extend: bare globs are channel IDs
  today; future namespace prefixes (channel:c-*, activation:w-*)
  can be added without breaking compatibility.
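
The glob semantics listed above (default `*`, Go's `path.Match`, multiple globs unioned) can be sketched in a few lines. `matchAny` is a hypothetical helper name; only `path.Match` itself is a real standard-library call.

```go
package main

import (
	"fmt"
	"path"
)

// matchAny (hypothetical) reports whether channelID matches at least one
// glob. An empty glob list falls back to "*", i.e. every channel.
func matchAny(globs []string, channelID string) (bool, error) {
	if len(globs) == 0 {
		globs = []string{"*"}
	}
	for _, g := range globs {
		ok, err := path.Match(g, channelID)
		if err != nil {
			return false, err // malformed pattern, e.g. an unclosed '['
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, id := range []string{"c-news1", "c-newsletter", "c-bullpen"} {
		ok, err := matchAny([]string{"c-news?", "c-newsletter"}, id)
		fmt.Println(id, ok, err)
	}
}
```

Note that `?` matches exactly one character, so `c-news?` matches `c-news1` but not `c-news` or `c-newsletter`; the second glob is what admits `c-newsletter` in the union.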

Tested: store + broadcaster unit tests, server endpoint test
covering glob match, since cursor, and default match. Live-tested
against the running mlops-newsletter demo (history backfill, live
event arrival via long-poll, multi-glob union).

Newsletter README updated to use `helix-org tail` instead of curl.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Both demos previously asked the user to either tail per-Worker
activation.log files or curl the channel events endpoint. Replace
both with helix-org tail:

- newsroom: drop "tile seven terminals" instruction in favour of one
  tail window (default '*' = all channels). Recommend per-channel
  globs (tail c-bullpen, tail c-recruiting) for narrower focus.
  "What to point at during the demo" callouts now name the exact
  tail command to run.
- getting-started: replace tail -f activation.log + curl-and-jq
  round-trip check with helix-org tail. Keep activation.log as a
  parenthetical for debugging the worker's internal claude stream.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…h Transport extensibility

## Abstraction Simplification

- **Channel → Stream**: Unified the Channel concept into Stream, removing redundant abstraction. Streams now hold the single named pub/sub channel.
- **Stream → Subscription**: Renamed the worker-channel edge from Stream to Subscription using a composite key (worker_id, stream_id). This eliminates synthetic stream IDs and clarifies the semantics: a subscription is a worker's interest in a stream, not the stream itself.
- **Transport Field**: Added optional Transport field to Stream to support future integrations (Slack, email, webhook, RSS, tick). Defaults to "local" (in-process pub/sub). Designed to be extensible without core changes.

## Architecture Changes

### Domain Layer (domain/)
- Added `transport.go`: Transport struct with Kind (enum) and optional Config (json.RawMessage)
- Added `subscription.go`: Subscription struct with WorkerID, StreamID, CreatedAt (composite key, no synthetic ID)
- Updated `stream.go`: Renamed from Channel; now holds ID, Name, Description, CreatedBy, CreatedAt, Transport
- Updated `event.go`: Changed ChannelID field to StreamID
- Updated `id.go`: Removed ChannelID type

### Store Layer (store/sqlite/)
- Added `subscription.go`: Subscriptions repository with Create, Delete, Find, ListForWorker, ListForStream
- Updated `stream.go`: Renamed from channel.go; added TransportKind and TransportConfig columns
- Updated `event.go`: Changed column references from channel_id to stream_id; JOINs on subscriptions instead of streams
- Updated `streams_and_events_test.go`: Renamed from feed_and_channels_test.go; comprehensive test coverage for new abstractions
- Updated `store.go`: Renamed Channels → Streams; replaced Streams → Subscriptions

### Broadcast & Dispatch (broadcast/, dispatch/)
- Renamed all channelID references to streamID throughout
- Updated method signatures to use StreamID instead of ChannelID

### Tools Layer (tools/)
- Added `create_stream.go`: New tool taking optional transport argument
- Added `read_events.go`: Replaces read_feed.go; queries subscriptions then long-polls streams
- Added `read_*.go` (streams, grants, positions, roles, workers): MCP tools replacing HTTP read endpoints
- Updated `subscribe.go`, `unsubscribe.go`, `publish.go`: Use streamId and Subscriptions API
- Renamed `channel_members.go` → `stream_members.go`; calls Subscriptions.ListForStream
- Updated `spawner.go`: Trigger struct uses StreamID; updated event notification text

### Server & HTTP (server/)
- Moved all read endpoints to MCP tools; `/workers/{id}/mcp` now handles mutations only
- Updated `tail.go`: Long-poll attributes renamed to streamID; calls store.Streams.List
- Simplified `server.go`: Only MCP mutation handler and tail endpoint remain
- Deleted: bootstrap.go, channels.go, environment.go, feed.go, grants.go, positions.go, roles.go, workers.go

### Bootstrap & CLI (bootstrap/, cmd/)
- Updated default tool grants to reference new tool names
- Updated vocabulary throughout: c- prefix → s- prefix for stream IDs

### Demos (demos/)
- Updated all demo READMEs and role definitions from channel to stream vocabulary
- Added `mlops-newsletter/hire.txt`: Example hire prompt

## Benefits

1. **Clearer semantics**: Stream is what it says (a named pub/sub channel), Subscription is the worker's interest in it
2. **Extensibility**: Transport field allows future integrations without core changes
3. **Reduced complexity**: No synthetic stream IDs, no redundant Feed/Channel/Stream layers
4. **MCP-first design**: All mutations now routed through MCP, read endpoints are MCP tools
5. **Smaller server surface**: HTTP endpoints only for authentication + tail streaming

## Testing

All 57 test cases pass with race detector enabled across all packages:
- domain: Subscription and Transport validation
- store/sqlite: Subscriptions repository operations, stream queries with JOINs
- broadcast: Pub/sub with streamID
- server: Tail long-poll with stream glob matching
- tools: All 13 MCP tools with varied schemas

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `/tail` HTTP long-poll endpoint and `helix-org tail/prompt/client`
CLI subcommands are now unnecessary: all human observation and
orchestration flows through MCP via `claude` sessions directly.

**Removals:**
- Delete server/tail.go (HTTP long-poll handler)
- Delete server/jsonapi.go (only used by tail)
- Delete cmd/helix-org/tail.go (CLI client)
- Delete cmd/helix-org/prompt.go (spawner stub)
- Delete cmd/helix-org/client.go (envelope types)
- Remove mux route for GET /tail
- Remove Broadcaster.SubscribeAll/UnsubscribeAll (dead after tail removal)
- Simplify serve/bootstrap doc: "one HTTP endpoint: /workers/{id}/mcp"

**Updates:**
- demos/getting-started/README.md: replace helix-org tail with claude
  watcher prompt using subscribe + read_events(wait=60)
- demos/mlops-newsletter/README.md: same pattern
- demos/newsroom/README.md: same pattern, plus add recruiter role
  "On hire" trigger to handle stream race condition
- CLAUDE.md: clarify that human observation uses MCP (no /tail endpoint)
- tools/publish.go: comment fix

**Fixes:**
- cmd/helix-org/bootstrap.go: make installClaudeMCPEntry idempotent
  by removing stale entry before adding (re-running bootstrap between
  demo wipes no longer fails)
- demos/newsroom/roles/recruiter.md: add "On hire" subscribe + retry
  guidance matching researcher/journalist (Renée was getting hired
  before Maya's hire activation created s-recruiting)

All three demos tested end-to-end: bootstrap → scaffold → hire cascade
→ event publishing → role live-edit → behavior change confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add helix-org chat — an interactive claude session pointed at a Worker's
MCP endpoint (default w-owner). Supports --new, --resume, --worker flags,
and session persistence via claude's per-cwd store with --continue.

Update all three demos to show only the interactive chat flow:

- getting-started: condensed from two-terminal to one, removed
  --install-claude-mcp, Bootstrap → chat → type prompts as w-owner
- mlops-newsletter: removed separate watcher terminal, team setup and
  brief publishing now happen inline in chat
- newsroom: removed multi-terminal watcher, all interaction happens
  in the bootstrap + chat session

Demos now focus on the actual user experience (typing into a chat)
which mirrors a real UI-based server. Removed background concepts,
multi-terminal complexity, and one-shot (-p) mode from demos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
helix-org chat unconditionally passed --continue, so the first run in a
fresh directory exited with "No conversation found to continue" before
the user could type anything. Probe ~/.claude/projects/<encoded-cwd>/
for any .jsonl session file and only pass --continue when one exists;
otherwise let claude start fresh, which still seeds a session for the
next run to resume.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace claude's --continue flag with --resume <sessionId>, looked up
by reading the most-recently-modified .jsonl in the cwd's session
store and parsing the sessionId from its first line.

--continue rejects sessions whose log ended on certain non-user events
(e.g. an agent-name marker from a prior interrupted exit), failing
with "No conversation found to continue" even when the session is
fine to resume by ID. This blocked re-entry into chat in the demo
directories whenever a previous chat had exited mid-flight.

If no prior session exists, claude is launched without a resume flag
and starts fresh — matching the desired first-run behaviour.
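
The lookup described above (newest `.jsonl` in the session directory, `sessionId` parsed from its first line, `--resume` only when one is found) might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the session directory is passed in rather than derived from the cwd, and nothing beyond the `sessionId` field is assumed about the file format.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"time"
)

// sessionIDFromFirstLine (hypothetical) parses the sessionId field from one JSONL record.
func sessionIDFromFirstLine(line []byte) (string, error) {
	var head struct {
		SessionID string `json:"sessionId"`
	}
	if err := json.Unmarshal(line, &head); err != nil {
		return "", err
	}
	return head.SessionID, nil
}

// latestSessionID (hypothetical) returns the session ID of the most recently
// modified .jsonl file in dir, or "" when the directory or files are absent.
func latestSessionID(dir string) (string, error) {
	entries, err := os.ReadDir(dir)
	if os.IsNotExist(err) {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	var newest string
	var newestMod time.Time
	for _, e := range entries {
		if e.IsDir() || filepath.Ext(e.Name()) != ".jsonl" {
			continue
		}
		info, err := e.Info()
		if err != nil {
			return "", err
		}
		if newest == "" || info.ModTime().After(newestMod) {
			newest, newestMod = filepath.Join(dir, e.Name()), info.ModTime()
		}
	}
	if newest == "" {
		return "", nil
	}
	f, err := os.Open(newest)
	if err != nil {
		return "", err
	}
	defer f.Close()
	line, err := bufio.NewReader(f).ReadBytes('\n')
	if err != nil && err != io.EOF {
		return "", err
	}
	return sessionIDFromFirstLine(line)
}

// resumeArgs (hypothetical) yields the extra claude flags: --resume <id>
// when a prior session exists, nothing on a first run.
func resumeArgs(sessionID string) []string {
	if sessionID == "" {
		return nil
	}
	return []string{"--resume", sessionID}
}

func main() {
	dir := "." // placeholder; the real session directory is not derived here
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	id, err := latestSessionID(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "session lookup failed:", err)
		os.Exit(1)
	}
	fmt.Println("extra claude args:", resumeArgs(id))
}
```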

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two new MCP tools for worker-to-worker communication:

- dm: High-level tool bundling create_stream + invite_workers + publish
  into a single call. Creates per-pair streams with deterministic naming
  (s-dm-<sortedIDs>) so conversations reuse the same stream regardless of
  direction. Complements lower-level streaming tools with a high-level,
  autonomously-discoverable entry point.

- invite_workers: Subscribes one or more workers to a stream in a single call.
  Idempotent: re-inviting already-subscribed workers is a no-op. Enables
  batch subscription workflows without a manual loop.

Both tools are granted to the owner during bootstrap and tested end-to-end
(dm stream reuse across directions, idempotency, self-DM rejection, unknown
worker rejection).
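
The deterministic naming is what lets a conversation reuse one stream regardless of who initiates it. A minimal sketch, assuming the sorted IDs are joined with `-` (the commit message only gives the shape `s-dm-<sortedIDs>`, so the separator and the `dmStreamID` name are assumptions):

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// dmStreamID (hypothetical) derives the per-pair stream ID. Sorting the two
// worker IDs first makes the result independent of direction.
func dmStreamID(from, to string) (string, error) {
	if from == to {
		return "", errors.New("cannot dm yourself")
	}
	ids := []string{from, to}
	sort.Strings(ids)
	// The "-" separator is an assumption for illustration.
	return "s-dm-" + strings.Join(ids, "-"), nil
}

func main() {
	a, _ := dmStreamID("w-sam", "w-lee")
	b, _ := dmStreamID("w-lee", "w-sam")
	fmt.Println(a, a == b) // s-dm-w-lee-w-sam true
}
```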

Updated demo: newsroom step 6 now uses dm instead of the manual 4-step
workflow, and comments in publish/subscribe now point to dm as the
high-level entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces on-disk activation.log/jsonl files with a per-Worker activation
Stream. Assistant text, tool calls, tool results, and lifecycle markers
are now Events on s-activations-<workerID> — same primitive as every
other read in the system.

- hire_worker creates the activation Stream at hire time and subscribes
  the hiring Worker. The new Worker themselves is intentionally NOT
  subscribed (would loop the dispatcher otherwise).
- Spawner publishes one Event per atomic message segment (assistant
  text, tool_use, tool_result, system init, run result), bracketed by
  synthetic '=== activation: <trigger> ===' and '=== exit: <err> ==='
  markers. Append + Notify only — the dispatcher is skipped so per-
  message events can't re-trigger subscribed AI Workers.
- worker_log tool bundles subscribe + read_events scoped to one
  Worker's activation Stream. Mirrors the dm pattern: a friendly
  shortcut the agent can reach for from a 'show me what w-X is doing'
  instruction without knowing the stream-naming convention.

Persistence between activation runs is left to the Role: if a Worker
needs cross-run memory, the Role tells it to write to history.md and
read it back on the next activation. No system feature added.

Demos updated to showcase the new affordances:
- getting-started: step 3 uses worker_log to confirm hire activation
  finished, eliminating the cross-terminal log-watching requirement.
- mlops-newsletter: step 4 adds a peek-inside tip using worker_log.
- newsroom: adds a 'Watch a Worker work' step parallel to the dm
  step, plus a 'What to point at' bullet for fact-checker blocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds inbound webhook support to helix-org Streams. Each Stream can declare
transport.kind="webhook"; POST requests to /webhooks/<streamID> append the
request body as an Event, trigger the dispatcher to wake subscribed Workers,
and notify long-poll observers.

Key changes:
- domain/transport.go: add TransportWebhook kind with docstring
- server/server.go: add Dispatcher interface, update New() signature
- server/webhook.go: HTTP POST handler for /webhooks/{streamID}
- server/webhook_test.go: 9 test functions covering edge cases and concurrency
  * happy path, missing stream, wrong transport, empty body
  * size limits, nil broadcaster/dispatcher, UTF-8 handling
  * 25 concurrent POSTs, stream isolation
  * race-detector clean with -count=20

Also fixes critical :memory: SQLite concurrency bug:
- store/sqlite/sqlite.go: pin MaxOpenConns(1) for in-memory databases
- Root cause: each connection gets its own private :memory: DB
- Impact: concurrent HTTP tests now see consistent state

New demo:
- demos/webhook/README.md: 5-step specification (hire secretary, POST payload, read back)
- demos/webhook/roles/secretary.md: secretary subscribes to s-inbox, summarizes
  incoming payloads, DMs summaries to owner

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the webhook transport so a Stream can be configured to POST
each appended Event to an external URL. A Stream can now be inbound-
only (current behaviour, no config), outbound-only (config sets
outbound_url), or both at once — the dispatcher fires emit on every
append regardless of origin (webhook handler, publish tool, dm tool).

Key changes:
- domain/transport.go: WebhookConfig type with OutboundURL field;
  Validate now parses webhook config and rejects non-http(s) URLs,
  relative URLs, and empty hosts before stream creation
- dispatch/dispatcher.go: emitOutbound runs on every Dispatch, looks
  up the Stream's transport, and if outbound_url is set fires an
  async POST with X-Helix-Stream and X-Helix-Event headers; bounded
  by 5s timeout so slow targets don't stall publishes
- domain/transport_test.go: 14 cases covering Validate happy paths
  and rejection paths, plus WebhookConfig parse round-trip
- dispatch/dispatcher_test.go: 12 tests covering emit happy path,
  inbound-only no-emit, local-no-emit, missing stream, 4xx/5xx
  tolerance, unreachable host, slow target timeout, 25 concurrent
  emits, binary payload round-trip, malformed stored config, store
  lookup errors, and content-type/path preservation
- server/webhook_test.go: TestWebhookBridgesInboundToOutbound wires
  the real dispatcher end-to-end and proves an external POST to
  /webhooks/<streamID> bridges to an outbound POST when the same
  stream has both directions configured
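
The validation rules for `outbound_url` listed above (http or https only, no relative URLs, no empty hosts) map directly onto `net/url`. The following is a sketch rather than the PR's `Validate`; `validateOutboundURL` is a hypothetical name, and treating an empty string as "inbound-only" follows the description of the no-config case.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateOutboundURL (hypothetical) accepts an empty value (inbound-only
// stream) or an absolute http(s) URL with a non-empty host.
func validateOutboundURL(raw string) error {
	if raw == "" {
		return nil // no outbound side configured
	}
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("outbound_url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		// Relative URLs land here too: they have an empty scheme.
		return fmt.Errorf("outbound_url: scheme %q is not http or https", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("outbound_url: missing host")
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/hook",
		"ftp://example.com/hook",
		"/relative/path",
		"http:///no-host",
	} {
		fmt.Printf("%-28s %v\n", raw, validateOutboundURL(raw))
	}
}
```

Rejecting these at stream creation means the dispatcher never has to handle an unusable target at publish time.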

Demo narrative updated: secretary now subscribes to s-inbox, DMs the
owner with the summary, and publishes the summary to s-outbox which
is configured with outbound_url. A 4-terminal flow with a local nc
catcher shows the full inbound -> summarise -> outbound bridge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds domain.Message — a transport-agnostic envelope (From, To, Subject,
Body, ThreadID, InReplyTo, MessageID, Attachments, Extra) — and migrates
every event-producing path to encode it as JSON in Event.Body. There is
one storage shape going forward; future transports (email, Slack,
queues, feeds) translate at their boundary, Workers see the same
structure regardless of source.

Identity convention: From/To carry transport-native identifiers
verbatim (WorkerIDs when known, alice@x.com / U0123 / +15551234 / etc.
otherwise — no prefixes). Empty From means "no human originator" for
data feeds and triggers.
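
As a rough picture of the envelope, the sketch below round-trips a Message through JSON. The field set follows the list above (Attachments omitted for brevity); the JSON key names, the `Extra` value type, and the `Encode`/`Decode` signatures are assumptions for illustration and may not match `domain/message.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message sketches the transport-agnostic envelope. JSON tags are assumed.
type Message struct {
	From      string            `json:"from,omitempty"` // empty means no human originator
	To        []string          `json:"to,omitempty"`
	Subject   string            `json:"subject,omitempty"`
	Body      string            `json:"body"`
	ThreadID  string            `json:"threadId,omitempty"`
	InReplyTo string            `json:"inReplyTo,omitempty"`
	MessageID string            `json:"messageId,omitempty"`
	Extra     map[string]string `json:"extra,omitempty"`
}

// Encode (hypothetical) serialises the envelope for storage in Event.Body.
func Encode(m Message) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

// Decode (hypothetical) parses an Event.Body back into the envelope.
func Decode(body string) (Message, error) {
	var m Message
	err := json.Unmarshal([]byte(body), &m)
	return m, err
}

func main() {
	// Identifiers are carried verbatim: a WorkerID here, an email address
	// or phone number for other transports, with no prefixes.
	body, err := Encode(Message{From: "w-owner", To: []string{"w-sam"}, Body: "hello"})
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"from":"w-owner","to":["w-sam"],"body":"hello"}

	m, _ := Decode(body)
	fmt.Println(m.From, m.To, m.Body)
}
```

Because every producer writes this one shape, a Worker can branch on fields such as Subject or InReplyTo without knowing which transport delivered the event.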

Code changes:
- domain/message.go: Message + Attachment types, Encode/Decode helpers,
  Event.Message() parser, NewMessageEvent constructor
- tools/dm.go: produces Message{From: caller, To: [recipient], Body}
- tools/publish.go: accepts optional to/subject/threadId/inReplyTo/
  messageId/bodyContentType/attachments args; defaults From=caller
- server/webhook.go: wraps inbound POST bodies into Message{Body: raw}
- tools/spawner.go: activation log entries wrapped as Message{From:
  workerID, Body: line}; Trigger gains a Message field
- dispatch/dispatcher.go: parses Event.Body once, passes parsed
  Message and visible Body text to the spawner
- tools/read_events.go: surfaces Message.Body as `body` (visible text)
  and the full envelope as `message` — Roles needing structure read
  the latter; existing role prompts that read `.body` continue to work

Tests updated to use Event.Message() instead of comparing raw Body
strings; full make check passes (lint clean, race detector clean).

Demos verified end-to-end after the refactor:
- getting-started: hire echo worker, publish "hello", echo replies,
  live-edit role, "loud: HELLO" — all four steps green
- webhook: secretary summarises inbound POST, DMs owner, publishes to
  s-outbox, outbound emitter POSTs Message JSON to nc:9000 catcher
  (catcher now sees structured envelope, not raw text — README
  updated to describe this)
- mlops-newsletter: full editor → researcher → journalist → editor
  cascade produces a complete newsletter on s-newsletter
- newsroom: 7 roles, 2 positions, 2 hires (Maya + Renée), all
  activations clean — message machinery validated without running
  the real-PR cascade

Design doc at design/messages.md captures the convention, the per-
transport mapping table for future transports, and open questions
to resolve as new transports ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the email transport, the operational-config infrastructure
it sits on, and a runnable customer-service demo in which a Worker
(Sam) receives inbound emails and replies through the same transport.

Verified end-to-end: simulated inbound POST → +sam alias routed →
Sam's claude activation → reply published to s-support → outbound
emit POSTed to Postmark's /email API → real email delivered to
phil@winder.ai. ~22s wall-clock end-to-end on a cold activation.

Operational config (design/config.md):
- New configs table (key/value/audit), store.Configs interface,
  sqlite impl. Auto-migrated alongside the rest.
- config.Registry: subsystems Register a Spec (type, default,
  required, secret paths, description). Reads/writes go through it
  so the CLI's view matches what consumers actually consume.
- helix-org config CLI: set/get/list/delete. Opens the SQLite file
  directly (same path as bootstrap), so config writes commit and
  the running server picks them up on its next read — live updates
  without restart, and without an LLM ever touching the values.
  Secrets redacted by default; --reveal-secrets opts in.
- Strict separation: org-graph mutations stay on MCP; operational
  config (transport creds, future model selection, etc.) is
  CLI-only. Same SQLite file, two access paths, two threat models.

Email transport (transports/postmark):
- domain.TransportEmail kind + EmailConfig{Alias} stream config.
  Validate enforces lowercase alphanumeric/dash/underscore aliases
  so they compose safely into <hash>+<alias>@... or <alias>@Domain.
- Inbound HTTP handler at /email/postmark: parses Postmark's JSON,
  extracts the +alias suffix from OriginalRecipient, finds the
  matching Stream by alias, builds a domain.Message envelope (From,
  To, Subject, Body, MessageID, InReplyTo, ThreadID from headers,
  Attachment metadata), appends the event, fires the dispatcher.
- Outbound emitter: when a Worker publishes to an email Stream, the
  dispatcher invokes the transport's Emit, which composes a
  Postmark /email POST (From=server-config, To from Message.To,
  optional Reply-To at <hash>+<alias>@... for threading,
  In-Reply-To/References headers when set).
- Server-level config (token, inbound, from, optional
  disable_reply_to) lives in transport.postmark; per-stream
  config is just {"alias":"sam"}. The transport joins the two at
  runtime, so rotating creds is one CLI call with no restart.
- disable_reply_to flag: workaround for Postmark's pending-approval
  same-domain restriction (Reply-To at inbound.postmarkapp.com is
  treated as a cross-domain recipient and blocks the send). With
  it on, outbound works but customer replies won't loop back into
  helix until the account is approved — documented in the demo
  README as the path to closing the loop.
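
The alias rule and the `+alias` routing described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the transport's actual code: `ValidateAlias`, `AliasFromRecipient`, the regular expression, and the example address are all assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// aliasPattern is an assumed encoding of the rule "lowercase
// alphanumeric, dash, or underscore"; the real Validate may differ.
var aliasPattern = regexp.MustCompile(`^[a-z0-9_-]+$`)

// ValidateAlias (hypothetical name) rejects aliases that would not
// compose safely into an email address.
func ValidateAlias(alias string) error {
	if !aliasPattern.MatchString(alias) {
		return fmt.Errorf("invalid alias %q: use lowercase letters, digits, '-' or '_'", alias)
	}
	return nil
}

// AliasFromRecipient (hypothetical name) returns the +alias suffix of
// the local part of addr. ok is false when the address has no suffix.
func AliasFromRecipient(addr string) (alias string, ok bool) {
	at := strings.LastIndex(addr, "@")
	if at < 0 {
		return "", false
	}
	local := addr[:at]
	plus := strings.Index(local, "+")
	if plus < 0 || plus == len(local)-1 {
		return "", false
	}
	return local[plus+1:], true
}

func main() {
	// The address below is made up for illustration.
	alias, ok := AliasFromRecipient("abc123+sam@inbound.example")
	fmt.Println(alias, ok, ValidateAlias(alias))
}
```

Validating the alias when the Stream is created, rather than on inbound parse, would keep malformed aliases out of the routing table entirely.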

Dispatcher loop guard:
- Skip outbound emit when event.Source == "" (system-emitted, i.e.
  inbound from this transport's own webhook). Without this, a
  bidirectional Stream (one alias, both inbound and outbound) would
  echo every inbound message straight back out to itself.
  Worker-published events (Source != "") still emit normally.
- Replaced TestWebhookBridgesInboundToOutbound with
  TestWebhookInboundDoesNotEcho to lock the new behaviour in.
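
A minimal sketch of the guard described above, assuming a pared-down `Event` type. The `ShouldEmit` name and the `w-sam` worker ID are hypothetical; only the `Source == ""` convention comes from the commit message.

```go
package main

import "fmt"

// Event is a simplified stand-in for the real event type.
type Event struct {
	Source string // empty for system-emitted (inbound) events
	Body   string
}

// ShouldEmit (hypothetical name) reports whether the dispatcher should
// hand the event to the transport's outbound emitter. Inbound events
// carry no Source, so they are never echoed back out.
func ShouldEmit(e Event) bool {
	return e.Source != ""
}

func main() {
	inbound := Event{Body: "hello from a customer"}
	reply := Event{Source: "w-sam", Body: "hello back"}
	fmt.Println(ShouldEmit(inbound), ShouldEmit(reply))
}
```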

Server:
- Server.Handler now takes optional Routes so transports can mount
  their own inbound endpoints without server.go importing them. The
  email transport's /email/postmark gets mounted from cmd/helix-org/serve.go.

Demo (demos/email):
- README.md walks through the whole flow: signup → server token →
  Sender Signature → inbound hash → cloudflared/ngrok tunnel →
  Postmark InboundHookUrl → helix-org config set transport.postmark
  → bootstrap → hire Sam → send a real email. Includes the
  pending-approval workaround and the path to closing the
  customer-reply loop once approved.
- roles/customer-service.md: Sam reads inbound, drafts a 2–4
  sentence reply, escalates rather than fabricates, signs off
  '— Sam' on its own line.
- workers/sam.md: identity stub (real first name, no brand voice,
  knows when he doesn't know).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the email demo to show two workers — customer service
(Sam, alias=sam) and engineering (Lee, alias=engineer) — handling
a customer query that requires escalation. Every leg of the
four-hop cascade goes through Postmark; both Streams are
bidirectional; threading via Message-Id stitches the whole thing
into one logical conversation.

Verified e2e in ~2:15 wall-clock:

  customer → Sam   (Postmark inbound  → s-support)
  Sam → Lee        (Postmark send + inbound → s-engineer)
  Lee → Sam        (Postmark send + inbound → s-support, [eng] prefix)
  Sam → customer   (Postmark send → real inbox)

Three Postmark sends, all returned status=200; same ThreadID flowed
through every event.

Changes:
- demos/email/roles/customer-service.md: Sam now branches on
  Subject. `[eng]` prefix means Lee replied → walk s-support
  history by ThreadID to find the customer's original query, then
  reply to that customer with a paraphrased version of Lee's
  answer. Otherwise it's a customer query → answer directly when
  simple, forward to <hash>+engineer@inbound.postmarkapp.com when
  technical. ThreadID preservation is critical for the lookup.
- demos/email/roles/engineer.md (new): Lee subscribes to
  s-engineer, drafts 3-6 sentence technical answers, replies to
  Sam at the +sam alias with `[eng] Re:` subject prefix and
  preserved ThreadID.
- demos/email/workers/lee.md (new): identity stub.
- demos/email/README.md: rewritten "Run the demo" section for the
  two-worker flow. Adds an explicit `<INBOUND_HASH>` sed
  substitution step (workers know each other's addresses via
  role text). Drops the disable_reply_to workaround now that the
  Postmark account is approved. New "What this shows" bullets
  call out workers-as-email-participants and ThreadID-as-spine.
- demos/email/demo.cast: re-recorded asciicast of the four-hop
  cascade.

The mp4 (demos/email/demo.mp4) is regenerated locally but stays
gitignored, same convention as demos/getting-started/demo.mp4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the activation prompt only carried Body. The Worker had to
call read_events to learn Subject, From, ThreadID, Extra — exactly the
round-trip that caused the docs-engineer to misroute issue #3 to PR #2
during the github demo's E2E run.

renderTrigger now formats every populated envelope field into the
prompt, omitting empties for cleanliness. The Trigger.Body field is
dropped; callers pass the full Message instead.
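
As a rough sketch of the "omit empties" behaviour, assuming a reduced `Message` with only a few envelope fields (the real envelope has more, and the label format here is an assumption):

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a cut-down stand-in for the real envelope.
type Message struct {
	From, Subject, ThreadID, Body string
	To                            []string
}

// RenderTrigger (hypothetical signature) writes one "Label: value"
// line per populated field and skips empty ones.
func RenderTrigger(m Message) string {
	var b strings.Builder
	field := func(label, value string) {
		if value != "" {
			fmt.Fprintf(&b, "%s: %s\n", label, value)
		}
	}
	field("From", m.From)
	field("To", strings.Join(m.To, ", "))
	field("Subject", m.Subject)
	field("ThreadID", m.ThreadID)
	field("Body", m.Body)
	return b.String()
}

func main() {
	fmt.Print(RenderTrigger(Message{From: "alice", ThreadID: "#3", Body: "docs are stale"}))
}
```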

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub POSTs to a single /github/webhook endpoint; the transport
HMAC-verifies via X-Hub-Signature-256 against the installation's
webhook_secret, then fans the delivery out to every Stream whose
Config.Repo matches repository.full_name and whose Config.Events
whitelist contains the X-GitHub-Event header value.
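
The signature check can be sketched with the standard library alone. GitHub sends the header as `sha256=` followed by the hex HMAC-SHA256 of the raw request body; the function names below are assumptions for illustration, not the transport's real API.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Sign computes the header value GitHub would send for body.
func Sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// VerifySignature (hypothetical name) checks an X-Hub-Signature-256
// header value against the raw body using a constant-time comparison.
func VerifySignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret, body := []byte("s3cret"), []byte(`{"action":"opened"}`)
	fmt.Println(VerifySignature(secret, body, Sign(secret, body)))
}
```

The check has to run over the exact bytes received, before any JSON decoding, since re-serialising the payload can change the digest.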

Inbound only — acting on a repo (label, comment, review, open PR) is
the Worker's job via gh in its Environment. publish on a github stream
returns a loud error rather than silently no-op'ing.

The Message envelope is mapped from the upstream payload verbatim:
Subject = issue/PR title, Body = body, ThreadID = "#<number>",
MessageID = X-GitHub-Delivery, From = sender.login, Extra = the full
payload with one synthetic top-level "event" key injected from the
X-GitHub-Event header so Workers can branch on event type from Extra
alone.

Per-stream config is just routing identity (repo, events). Provider
credentials (token, webhook_secret) live in server-level config under
transport.github with both fields registered as Secrets so config get
redacts them. Regression tests pin both names against silent leaks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walkthrough demo of the doc-engineer role: spin up a real cloudflared
tunnel, register the webhook, hire the Worker, then exercise the
issues + pull_request + pull_request_review + issue_comment paths
against a live GitHub repo. README narrates each step; demo.cast is
the asciinema recording.

Design doc covers the identity model (no machine user; gh auth token
gives the engineer the operator's own identity for now), the inbound-
only decision, the message envelope mapping, and the operational
config / setup-via-chat flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move Role.Content and Worker.IdentityContent from disk-based markdown files
(role.md, identity.md) into the SQLite domain, enabling future evolution to
remote workspaces and eliminating hardcoded filename coupling.

## Key changes

- Domain: Worker interface now exposes IdentityContent() string method; both
  HumanWorker and AIWorker carry immutable identity field. Constructor signatures
  updated to accept identity content at hire time.
- Store: Added Update(ctx, worker) method to Workers interface, implemented via
  GORM with identity_content column in worker table.
- Tools:
  - update_role: Simplified to single DB write (removed 50-line fanOut loop).
  - update_identity: New tool, mirrors update_role's shape.
  - hire_worker: Creates DB records only; no env files at hire time.
  - spawner: Added projectEnv() function that lazily writes role.md, identity.md,
    agent.md to env at activation time, reading from DB.
- Bootstrap: Seed owner Worker with starter identity text; grant UpdateIdentityName.
- UI: Added /ui/org org-chart master-detail view. handleOrgIdentitySet() now
  calls Workers.Update() instead of WriteFile(). Removed disk path tracking.
- Tests: Updated 12+ call sites with identity parameter; rewrote
  TestUpdateRoleFanOut as TestProjectEnvWritesCanonicalState to verify
  lazy-projection contract.

## Why

Hardcoded filenames across hire_worker, tools, spawner, and UI meant the system
could not evolve to support remote workspaces or other workspace configurations.
Making the DB the source of truth and performing projection at activation time
(not at hire time) lets future work extend to remote/ephemeral environments
without changing tool or bootstrap logic.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…ahead

Add MCP prompts — server-defined slash commands gated by tool grants:

- New prompts package: Prompt interface, Registry (mirrors tools.Registry),
  and builtins (Role and Help).
- /help: Self-introspecting command that walks the registry at render time
  and produces a markdown list of every other prompt. Adding a new prompt
  automatically lights it up in /help without touching this file.
- /role: Drafts a new Role from a title hint, expands to full interview
  template, saves via create_role, then offers edits or chains to hire_worker.
- Server-side expansion in chat bridge: SendHandler intercepts inputs
  starting with /, expands them from template before sending to claude.
  User sees original input in their bubble.
- Chat typeahead: CommandsHandler (POST /ui/chat/commands) renders
  matching prompts as HTML buttons on every keyup. Clicking fills the
  textarea and focuses it.
- Enum schema constraints: WorkerKind and TransportKind now surface as
  enums in JSON Schema so MCP clients see valid values in tool input
  autocomplete.
- Self-documenting validation: WorkerKind.Validate() formats errors as
  'unknown worker kind "foo" (valid: "human", "ai")' so clients can
  self-correct without reading source.
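
One way to produce the self-documenting error quoted above is sketched here. The constant names and the `validWorkerKinds` slice are assumptions; only the error text is taken from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

type WorkerKind string

// Constant names are hypothetical; the values come from the error text.
const (
	WorkerHuman WorkerKind = "human"
	WorkerAI    WorkerKind = "ai"
)

var validWorkerKinds = []WorkerKind{WorkerHuman, WorkerAI}

// Validate returns nil for a known kind, otherwise an error that
// lists every valid value so a client can correct itself.
func (k WorkerKind) Validate() error {
	quoted := make([]string, 0, len(validWorkerKinds))
	for _, v := range validWorkerKinds {
		if k == v {
			return nil
		}
		quoted = append(quoted, fmt.Sprintf("%q", string(v)))
	}
	return fmt.Errorf("unknown worker kind %q (valid: %s)", string(k), strings.Join(quoted, ", "))
}

func main() {
	fmt.Println(WorkerKind("foo").Validate())
}
```

Building the message from the same slice that drives validation means a newly added kind shows up in the error text without a second edit.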

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… polling, and tool visibility

Major changes:

- **Prevent cascading AI-worker activations**: Added SourceKind classifier (human/ai) to Trigger;
  workers now deprioritize or skip AI-origin events per agent.md discipline rules. Dispatcher
  skips self-reactivation on publish. Tests pin self-skip and source_kind behavior.

- **Fix SSE newline rendering**: Split markdown fragments across multiple `data:` lines (SSE
  spec compliant) instead of collapsing newlines. Browser's EventSource rejoins with \n,
  preserving fenced code blocks and list formatting.

- **Add markdown rendering**: Integrated goldmark for safe HTML rendering of Role/Activity text.
  Added .md CSS class for styling (lists, code, links, headers, blockquotes). Goldmark runs
  in safe mode; raw HTML is omitted (not escaped). Tests verify bold/lists/code/headings render
  and <script> tags are dropped.

- **Real-time polling UI**: Added htmx polling (every 5s) to org chart, streams list, and
  events feed. Fixed htmx attribute inheritance breaking child click handlers by adding
  hx-disinherit="*" on poll parents. Implemented unified all-streams firehose when no stream
  selected.

- **Tool grant visibility**: Org detail now shows each Worker's granted tools as alphabetically-
  sorted chip badges. Schema exposes MCP tool names; UI surfaces them without requiring a
  separate tools query.

- **System prompt templates**: Moved agent.md and owner_role.md to embedded templates so
  content can be edited via /ui/org and doesn't require code changes. Agent.md teaches AI
  workers that human constraints don't apply and defaults to action. Owner role teaches
  delegation, polling pattern, and stream subscription during hiring.

- **Hiring playbook refinement**: Updated role template to instruct on stream provisioning:
  list_streams → create if missing → subscribe. Emphasized "Worker without streams is
  half-hired."

- **Title selection priority**: Sessions now track separate ai-title events and prefer them
  over user input for recents display (custom > ai-generated > fallback).

- **Model/effort defaults**: Changed claude.model default to "sonnet" for cost predictability;
  added claude.effort default "low" to minimize extended-thinking budget. Both configurable
  via registry.
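
The SSE newline fix listed above relies on a rule from the SSE specification: an event may carry several `data:` lines, and the client joins them with `\n`. A minimal encoder, with `FormatSSE` as a hypothetical name, might look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// FormatSSE (hypothetical name) encodes payload as a single SSE event,
// emitting one "data:" line per line of input. A blank line ends the
// event. Carriage returns are not handled in this sketch.
func FormatSSE(payload string) string {
	var b strings.Builder
	for _, line := range strings.Split(payload, "\n") {
		b.WriteString("data: ")
		b.WriteString(line)
		b.WriteString("\n")
	}
	b.WriteString("\n")
	return b.String()
}

func main() {
	fmt.Print(FormatSSE("- item one\n- item two"))
}
```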

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… docs

- Update 'make run' to automatically invoke 'helix-org serve' with sensible defaults (./envs, ./helix-org.db, :8080) rather than bare 'go run'
- Enhance 'make clean' to kill running servers and remove local state (DB, envs) in one command
- Improve CLAUDE.md to document these defaults and explain when/why to use each target
- Clarify that ad-hoc 'go' commands should be avoided in favor of make targets to ensure consistent build/test environment

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
The dispatcher now coalesces events that arrive while an activation is
running, passing them to the Spawner as a single batched []Trigger
instead of spawning N separate claude processes. This collapses webhook
cascades (e.g. five GitHub events from a worker's own action against a
shared auth token) into one follow-up activation.

Implementation:
- Spawner signature: trigger -> []Trigger
- Dispatcher: per-worker queue (pending slice + running flag) replaces
  per-worker mutex. enqueue() appends and starts runner if needed;
  run() drains queue in a loop until empty, calling spawner once per
  drain with the accumulated batch.
- buildPrompt() renders multiple triggers as [1/N], [2/N], etc. when
  there's more than one, so agents see them as a numbered list.
- New test proves coalescing: block first activation, publish 3 more
  events, release -> expect [e-1] then [e-2, e-3, e-4], not 4 separate
  activations.
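
The pending-slice plus running-flag design can be sketched as below. This is an illustrative reduction under assumed names (`workerQueue`, `Enqueue`, `Wait`), not the dispatcher's actual code; the `WaitGroup` exists only so the example can wait for the runner to finish.

```go
package main

import (
	"fmt"
	"sync"
)

type Trigger struct{ EventID string }

// workerQueue coalesces triggers that arrive while an activation runs.
type workerQueue struct {
	mu      sync.Mutex
	pending []Trigger
	running bool
	spawn   func(batch []Trigger) // one call per activation
	wg      sync.WaitGroup
}

// Enqueue appends the trigger and starts a runner if none is active.
func (q *workerQueue) Enqueue(t Trigger) {
	q.mu.Lock()
	q.pending = append(q.pending, t)
	if q.running {
		q.mu.Unlock()
		return
	}
	q.running = true
	q.wg.Add(1)
	q.mu.Unlock()
	go q.run()
}

// run drains the queue, passing everything accumulated so far to the
// spawner as a single batch, until nothing is left.
func (q *workerQueue) run() {
	defer q.wg.Done()
	for {
		q.mu.Lock()
		if len(q.pending) == 0 {
			q.running = false
			q.mu.Unlock()
			return
		}
		batch := q.pending
		q.pending = nil
		q.mu.Unlock()
		q.spawn(batch)
	}
}

// Wait blocks until the current runner has drained the queue.
func (q *workerQueue) Wait() { q.wg.Wait() }

func main() {
	started, release := make(chan struct{}), make(chan struct{})
	first := true
	q := &workerQueue{}
	q.spawn = func(batch []Trigger) {
		fmt.Println("activation with", len(batch), "trigger(s)")
		if first {
			first = false
			close(started)
			<-release // hold the first activation open
		}
	}
	q.Enqueue(Trigger{EventID: "e-1"})
	<-started
	for _, id := range []string{"e-2", "e-3", "e-4"} {
		q.Enqueue(Trigger{EventID: id})
	}
	close(release)
	q.Wait()
}
```

Because `running` is only cleared under the lock when `pending` is empty, at most one runner is active per worker, which is the property the per-worker mutex used to provide.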

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The github-engineer demo includes:
- Full README with prerequisites, setup steps, and teardown instructions
- Runnable end-to-end example of a software engineer worker on GitHub
- Role documentation for handling task lifecycle, review feedback, and board state

Updates to prerequisites:
- Document required gh token scopes (project, read:project)
- Document port availability requirement for helix-org server
- Add instructions for creating and linking a GitHub Project v2 board

Updates to software-engineer role:
- Add dm tool to MCP surface (was: subscribe, read_events)
- Add constraint: escalate setup-level problems to owner via DM instead of failing silently
  (covers: gh auth issues, missing board, repo unreachable, missing tools, discovery failure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ession reuse

End-to-end working chat + dispatcher → Helix zed_external desktops with
the org-graph MCP attached. Each Worker (human or AI) gets its own
project + agent app + git repo at hire time; new activations reuse the
same long-lived chat session so follow-ups complete in seconds instead
of paying a 3-minute cold-start every turn.

Key fixes that came out of debugging against app.helix.ml:

- HelixProjectApplier creates a Helix-internal git repo, seeds it with
  a README so `main` exists, creates the `helix-specs` branch, and
  pushes role/identity to `workers/<id>/.context/` on that branch.
  The desktop's startup script then materialises the helix-specs
  worktree at `~/work/helix-specs/` automatically.
- Project-apply does NOT auto-create a repo; without one the desktop's
  startup script bails with "No repositories were cloned successfully"
  and Zed never launches.
- StartChatRequest now sends `app_id` so `session.ParentApp` is set —
  Helix's external MCP proxy bails with "session has no associated
  agent" otherwise, and Zed never sees the helix MCP.
- StartChatRequest sends `organization_id` (Helix doesn't auto-populate
  it from project_id; without it desktop quota falls back to the
  personal-org limit of 2).
- Streaming-aware StartChatWithStatus: reads the SSE response, returns
  the session ID + a flag indicating whether the WS-not-ready race
  fired. Detached upstream context so the request survives past the
  caller's request ctx closing.
- warmupAndRetry (chat bridge) and warmupSession (spawner) re-POST the
  same prompt every 8–20s until the dispatch lands. Helix's
  waitForExternalAgentReady checks connections globally, so the wait
  passes immediately when other users have desktops up; the per-session
  sendCommand then fails fast and Helix marks the interaction error
  (auto-wake won't recover state=error). The retry pattern absorbs
  the race client-side.
- Spawner reuses worker.HelixSessionID() across activations. Each
  fresh session spawns a fresh container; reuse keeps it warm.
- Owner-role hiring playbook updated: hire_worker MUST include
  `grants` matching the Role's Tools section. The MCP tool list is
  frozen by Helix's external-MCP-proxy cache for the lifetime of the
  first session, so granting later means the Worker can't see the
  tools until session restart.
- Runtime switched from claude_code → zed_agent. claude_code talks
  directly to Anthropic and needs an API key wired into the container
  (which we don't); zed_agent routes inference back through Helix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n agent

Live role-edit (update_role) now propagates to running Workers without
requiring a session restart:

- HelixProjectApplier.Ensure no longer early-returns on the fast path
  before pushing files. The expensive ApplyProject / CreateGitRepo /
  AttachRepo steps still skip when the project exists, but
  agent.md / role.md / identity.md are re-pushed to the helix-specs
  branch on every Ensure call. CreateBranch and PutFile are idempotent
  and cheap, so the cost is two HTTP calls per activation.
- Spawner activation prompt (helixSpecsMandate) now ALWAYS runs
  `git pull --ff-only origin helix-specs` at the start of every
  activation (fall-through to `git worktree add` only when the worktree
  is missing). Without this, the agent reads the worktree's stale
  on-disk copy and the new role text never takes effect.
- Activation prompt now also reads `.context/agent.md` first as the
  org-wide entrypoint, then role.md, then identity.md.
- AgentMD threaded through HelixSpawnerConfig and HelixProjectApplier
  so the spawner+chat-backend both seed the org policy on apply.

Validated end-to-end via demos/getting-started:
  publish hello → echo: hello (initial role)
  update_role r-echo → "loud: <BODY UPPERCASED>"
  publish hello → loud: HELLO ← live-edit takes effect

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d channel discipline

- Add **On anything else. Stay quiet** block (required in every Role) to establish
  default behavior: don't post unless a trigger above matches and output is something
  a human asked for.
- Require explicit output channel per trigger (`Post to s-{channel}` or "no post").
- Add constraint requiring workers to name the trigger before acting, enabling
  audit-log inspection and forcing commitment to a frame.
- Clarify drafting instructions so LLM-generated Roles include these elements.

This addresses the "chatty colleague" failure mode at the template level: models now
have explicit permission boundaries and must name their reasoning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
philwinder force-pushed the feat/helix-org-prompt-driven-mcp branch from 2386a28 to 284d86b on May 4, 2026 09:43
philwinder and others added 13 commits May 6, 2026 17:22
- Move Helix-specific Worker fields off domain.Worker into a sidecar
  WorkerRuntimeState store keyed on (workerID, backend, key). Drops
  six methods from the domain interface and isolates per-runtime
  pointers behind typed helpers in agent/helix/state.go.
- Move the runtime layer out of tools/: new agent/, agent/claude/,
  agent/helix/ packages plus helix/helixclient/ (was tools/helixclient/).
  tools/ now holds only org-graph MCP tools and Deps.
- Rename SpecsPublisher -> agent.WorkspaceSync. Logical-name contract
  ("role.md", "identity.md"); each backend translates to its own
  layout (claude: <envsDir>/<wid>/<name>; helix:
  workers/<wid>/.context/<name>). Fixes the prior path mismatch where
  update_role wrote job/* but the activation mandate read .context/*.
- Move agent.md from tools/templates/ to agent/policy.md and embed as
  agent.Policy so both runtimes share one source.
- Unify session shape: helix.Runtime ("zed_agent") and helix.AgentType
  ("zed_external") are non-configurable constants used by every
  project apply and every /sessions/chat post. Drops chat.agent_type
  config key and the SpawnerConfig.Runtime / ProjectApplier.Runtime
  fields so the spawner and chat backend can no longer drift to
  claude_code.
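
The logical-name contract from the WorkspaceSync bullet above could look roughly like this. The `Layout` interface and both implementations are hypothetical; only the two path shapes are taken from the commit message.

```go
package main

import (
	"fmt"
	"path"
)

// Layout (hypothetical) maps a logical file name such as "role.md"
// to a backend-specific location for a given worker.
type Layout interface {
	Path(workerID, name string) string
}

// claudeLayout follows <envsDir>/<wid>/<name>.
type claudeLayout struct{ envsDir string }

func (l claudeLayout) Path(workerID, name string) string {
	return path.Join(l.envsDir, workerID, name)
}

// helixLayout follows workers/<wid>/.context/<name>.
type helixLayout struct{}

func (helixLayout) Path(workerID, name string) string {
	return path.Join("workers", workerID, ".context", name)
}

func main() {
	// "w-echo" and "envs" are made-up values for illustration.
	for _, l := range []Layout{claudeLayout{envsDir: "envs"}, helixLayout{}} {
		fmt.Println(l.Path("w-echo", "role.md"))
	}
}
```

Keeping callers on logical names means a tool such as update_role never constructs a backend path itself, which is the kind of mismatch this commit set out to remove.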

Verified end-to-end against app.helix.ml: getting-started demo (hire
echo, publish hello, echo: hello, live update_role, loud: HELLO).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New demo: operator raises NCR on shop floor → agent fans out to
supervisor (Slack), customers (SMS), supplier (email held) → supervisor
approves containment in one DM → agent confirms and kills/sends supplier
email based on approval text. Shows the hold pattern and the split
between agent (glue) and human (decisions).

Verified end-to-end against app.helix.ml with comms-demo container.
Three channels (email/slack/sms), two activations, ~90 seconds on stage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat agent was creating s-ncr-raised with the default transport
(local) because the hire prompt said "no config" — leaving it
ambiguous whether the transport itself was needed. Symptom on stage:
POST /webhooks/s-ncr-raised → 404 "is not a webhook stream".

Three changes:
- Hire prompt now spells out the create_stream JSON for every
  stream and explicitly says do not omit the transport field.
- Adds a smoke-test curl after hire that fails fast if any stream
  is misconfigured.
- Adds the local-transport failure mode to the Recovery table with
  the verbatim fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat agent kept guessing wrong on transport.kind ("webhook",
"incoming-webhook", and {"kind":"webhook","direction":"in"}) because
the JSON schema exposed kind as a plain string with no enum and no
description. We already had a TransportKind enum surfacer wired up
in tools/schema.go — but createStreamTransport.Kind was typed as
string, not domain.TransportKind, so the enrichment never applied to
this schema.

- Retype createStreamTransport.Kind to domain.TransportKind so the
  existing enum-and-description enrichment kicks in.
- Beef up the tool's Description with the valid kinds and a webhook
  example for clients that don't render enum constraints.

Verified: schema now exposes
  enum: ["local", "webhook", "email", "github"]
and bad kinds are rejected with the existing self-documenting error
("valid: \"local\", \"webhook\", \"email\", \"github\"").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
create_stream's schema now surfaces transport.kind as
enum: ["local","webhook","email","github"] with a description, so
the hire prompt no longer has to defend against the agent guessing
"incoming-webhook" or omitting the transport entirely.

- Trim the "do not omit transport" guardrail and the post-hire
  get_stream verification step — both were workarounds for the
  schema gap, now closed.
- Add a note to always pass `chat --new` after rebuilding the
  binary; chat-driving claude caches MCP tool schemas at session
  start and won't see new enum constraints without a fresh session.
- Soften (don't remove) the local-default Recovery row: stale chat
  sessions on a fresh binary can still hit it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schema now exposes the valid transport kinds, so the prompt no
longer needs literal JSON arguments — describing the streams in
words is enough for the agent to call create_stream correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smaller chat models reliably collapse the canonical
{"transport":{"kind":"webhook"}} object to its discriminator string
{"transport":"webhook"} once they've seen the kind enum on the
schema, then watch the call fail with a JSON-unmarshal error and
loop. Both shapes are unambiguous and mean the same thing — accept
both.

- Custom UnmarshalJSON on createStreamTransport handles either form.
- Schema declares transport as a oneOf [enum-string, object] so
  strict-validating MCP clients accept the shorthand too.
- Tests cover both input forms and the schema shape.
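
A sketch of an `UnmarshalJSON` that accepts either shape, assuming a simplified `createStreamTransport`. The `Config` field, the `createStreamArgs` wrapper, and `ParseTransportKind` are illustrative assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type TransportKind string

// createStreamTransport is a simplified stand-in; Config is assumed.
type createStreamTransport struct {
	Kind   TransportKind   `json:"kind"`
	Config json.RawMessage `json:"config,omitempty"`
}

// UnmarshalJSON accepts the shorthand string ("webhook") as well as
// the canonical object ({"kind":"webhook"}).
func (t *createStreamTransport) UnmarshalJSON(data []byte) error {
	data = bytes.TrimSpace(data)
	if len(data) > 0 && data[0] == '"' {
		var kind string
		if err := json.Unmarshal(data, &kind); err != nil {
			return err
		}
		*t = createStreamTransport{Kind: TransportKind(kind)}
		return nil
	}
	// The local type has no methods, which avoids infinite recursion.
	type plain createStreamTransport
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	*t = createStreamTransport(p)
	return nil
}

type createStreamArgs struct {
	Transport createStreamTransport `json:"transport"`
}

// ParseTransportKind is a helper for the example only.
func ParseTransportKind(raw string) (TransportKind, error) {
	var args createStreamArgs
	if err := json.Unmarshal([]byte(raw), &args); err != nil {
		return "", err
	}
	return args.Transport.Kind, nil
}

func main() {
	for _, raw := range []string{`{"transport":"webhook"}`, `{"transport":{"kind":"webhook"}}`} {
		kind, err := ParseTransportKind(raw)
		fmt.Println(kind, err)
	}
}
```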

Verified live: create_stream with transport:"webhook" produces a
stream with transportKind:"webhook"; the object form still works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat backend (chat.backend=helix) runs the chat-driving agent
inside a Helix sandbox that does NOT have this repo checked out.
Telling it to "read ./demos/manufacturing/roles/quality-bot.md" is
a dead instruction — the file isn't there. The Zed agent then
spirals through every other tool it has trying to find context:
kodit_repositories, kodit_wiki, kodit_grep, curl on localhost:9876,
ls on the helix-specs branch, etc.

Fix: paste the entire role markdown inline in the hire prompt so
the agent has zero reason to fetch anything from the filesystem.
Add explicit "Use ONLY the helix-org MCP tools, do NOT read files,
do NOT use kodit, do NOT curl URLs" steering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pointer schema arrived as Types:["object","null"]; setting Type
without clearing Types produced an invalid jsonschema (both Type and
Types non-zero is a marshal error), which broke MCP tools/list at
session start and starved Claude of every helix-org tool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bare `helix` pattern matched any directory named `helix` at any
depth, which was silently swallowing helix-org/helix/ and
helix-org/agent/helix/ — entire packages (helixclient, spawner,
project applier, runtime state, workspace) sitting in the working tree
but never reaching git. The original intent was to ignore the `helix`
binary at known cmd paths; anchor it there so the helix-org subtree
becomes trackable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vider

Adopts #2375 (the durable session-scoped message queue)
and #2399 (cold-start dev-container wake) so helix-org no
longer needs to fight the framework with a client-side warmup loop.

## helix client

- New `SendSessionMessage(ctx, sid, content, opts)` posts to
  /api/v1/sessions/{id}/messages — Helix persists the interaction and
  pickupWaitingInteraction delivers it once the agent's WS is reachable.
  Returns 200 even when no agent is connected.
- New `ListProviders` and `ListModelsForProvider`, plus a
  `ValidateProviderModel` helper that checks chat.provider /
  chat.model against the live Helix instance. We hit /v1/models with
  the provider query string (the bare aggregate endpoint excludes
  Anthropic and is unreliable).

## Spawner refactor (agent/helix/spawner.go)

- Follow-up activations queue via `SendSessionMessage` — no StartChat
  round-trip. 290ms instead of 7s+ on a warm session.
- First activations still use `StartChat` to create the session; on the
  cold-start `hadWSError` race we re-queue the same prompt via the
  durable endpoint instead of polling for up to 5 minutes.
- Drops `warmupSession` (~40 lines).
- New tests: `TestSpawnerFollowUpUsesSendSessionMessage` (asserts no
  StartChat on follow-up) and `TestSpawnerColdStartReQueues` (asserts
  the hadWSError → queue handoff).
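
The two-path flow can be sketched against a small interface. The method signatures here are simplified assumptions (the real `SendSessionMessage` also takes a context and options), and `fakeClient` exists only to make the example runnable.

```go
package main

import "fmt"

// Client is a hypothetical reduction of the Helix client surface.
type Client interface {
	StartChat(prompt string) (sessionID string, hadWSError bool, err error)
	SendSessionMessage(sessionID, content string) error
}

// Activate queues follow-ups on an existing session; for a first
// activation it starts a chat and, if the cold-start race fired,
// re-queues the same prompt through the durable endpoint.
func Activate(c Client, sessionID, prompt string) (string, error) {
	if sessionID != "" {
		return sessionID, c.SendSessionMessage(sessionID, prompt)
	}
	sid, hadWSError, err := c.StartChat(prompt)
	if err != nil {
		return "", err
	}
	if hadWSError {
		if err := c.SendSessionMessage(sid, prompt); err != nil {
			return sid, err
		}
	}
	return sid, nil
}

// fakeClient records which calls were made.
type fakeClient struct {
	wsError bool
	calls   []string
}

func (f *fakeClient) StartChat(prompt string) (string, bool, error) {
	f.calls = append(f.calls, "StartChat")
	return "ses-1", f.wsError, nil
}

func (f *fakeClient) SendSessionMessage(sessionID, content string) error {
	f.calls = append(f.calls, "SendSessionMessage")
	return nil
}

func main() {
	cold := &fakeClient{wsError: true}
	sid, _ := Activate(cold, "", "ping")
	fmt.Println(sid, cold.calls)

	warm := &fakeClient{}
	Activate(warm, sid, "ping again")
	fmt.Println(warm.calls)
}
```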

## Chat-bridge refactor (server/chat/helix_bridge.go)

- Same two-path treatment: follow-ups via `SendSessionMessage`, fresh
  sessions via `StartChat` with cold-start fallback to the queue.
- Drops `warmupAndRetry` and the 5-minute background goroutine
  (~70 lines).
- Existing test updated to assert follow-ups go through the queue.

## Provider/model validation

- `bootstrap helix-runtime` now runs the validator after WhoAmI and
  prints the actual providers/models on failure.
- `serve` refuses to start with bad chat.provider / chat.model and
  points operators at the exact config commands to fix it.

Without this, a typo in chat.provider surfaces as a 422 from
/sessions/{id}/zed-config three minutes later when the desktop tries
to fetch its Zed config — with no obvious link back to the bad key.
The validator turns that into a fail-fast at startup.

## Verified end-to-end against meta.helix.ml

Final smoke session: ses_01kr9bcpcm9gnpr7k5y4fgjmdk
- First send → StartChat (~31s for Zed cold boot) → "pong"
- Follow-up → SendSessionMessage (347ms to queue) → response within ~10s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds CheckDesktopQuota helper that hits /api/v1/config and refuses
when max_concurrent_desktops would be exceeded by spinning up one
more session. Wired into both code paths that open *new* zed_external
sessions:

- agent/helix/spawner.go::ensureSession (AI Worker activations)
- server/chat/helix_bridge.go::send       (owner chat first turn)

Follow-ups skip the check — they reuse the warm container and don't
allocate a new desktop slot.

Without this, a quota-full Helix would let helix-org spin up the per-
Worker project plumbing (apply secrets, attach MCP, create agent app)
and only fail at the StartDesktop step with a generic 500 several
seconds later. The new error message names the actual count and
points operators at the fix:

  desktop quota reached on Helix (3/2 active) — stop one of the
  existing sessions before opening a new one

The check is soft (no atomic reserve) — a parallel caller could still
race for the last slot, in which case Helix's own quota error wins.
That's acceptable; the goal is operator clarity in the common single-
user case.
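
The core of the soft check reduces to a comparison before allocating one more slot. In this sketch `CheckQuota` is a hypothetical pure function; the real helper reads the limit from /api/v1/config, the error text is abbreviated, and treating a non-positive limit as "unlimited" is an assumption.

```go
package main

import "fmt"

// CheckQuota (hypothetical name) refuses when opening one more desktop
// would exceed max. A max of zero or less is treated as unlimited,
// which is an assumption of this sketch.
func CheckQuota(active, max int) error {
	if max > 0 && active+1 > max {
		return fmt.Errorf("desktop quota reached on Helix (%d/%d active)", active, max)
	}
	return nil
}

func main() {
	fmt.Println(CheckQuota(3, 2))
	fmt.Println(CheckQuota(1, 2))
}
```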

Verified end-to-end against meta.helix.ml: with active=3/max=2, send
returned 500 + actionable message in 289ms; after stopping two
sessions (active=1), the same request opened a session in 7s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>